dOpenCL: Towards Uniform Programming for Distributed Systems with Multi-Cores and GPUs
Authors
Abstract
Modern computer systems are becoming distributed and heterogeneous by comprising multi-core CPUs, GPUs, and other accelerators. However, to program such systems, the user currently has to use a combination of several programming models (e.g., MPI with OpenCL or CUDA), which is difficult and error-prone. We present dOpenCL (distributed OpenCL) – a uniform approach to programming distributed systems with accelerators. Our approach is based on the OpenCL standard and it allows the user to run existing OpenCL applications unmodified in a heterogeneous distributed environment. The dOpenCL system also supports transparent execution of multiple OpenCL applications in one distributed, multi-user environment. We describe dOpenCL as an implementation of the OpenCL programming model on distributed systems, and we experimentally compare the performance of dOpenCL with MPI+OpenCL and standard OpenCL implementations.
Similar resources
Accelerating high-order WENO schemes using two heterogeneous GPUs
A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...
Towards High-Level Programming for Systems with Many Cores
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-662-46823-4_10. Abstract. Application development for modern high-performance systems with many cores, i.e., comprising multiple Graphics Processing Units (GPUs) and multi-core CPUs, currently exploits low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs....
Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources with respect to thread concurrency. In this paper, to resolve the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...
An Adaptive Framework for Managing Heterogeneous Many-Core Clusters
The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as IBM Cell an...
Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures
We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multicore and multi-GPU system to support matrix computations efficiently. Our approach is able to achieve four objectives: a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system as a distributed-memory machine...
Journal title:
Volume, issue:
Pages: -
Publication date: 2013